1. INTRODUCTION: 


Posttraumatic stress (PTS) is among the most common mental health problems among Service 
Members and Veterans returning from recent deployments, and despite the availability of 
evidence-based treatments (EBT), many of those with mental health problems do not seek or 
postpone seeking EBT for PTS, defined here as Prolonged Exposure Therapy (PE) and/or 
Cognitive Processing Therapy (CPT). This study will improve clinical care for the large number of 
Warfighters and Veterans who suffer from PTS by determining if Veterans receiving EBT outside 
of research trials actually demonstrate PTS symptom improvement in a clinical setting, whether 
these EBTs for PTS impact suicidality in a clinical setting, what factors are associated with PTS 
symptom improvement in those who benefit from EBT, and whether all Veterans benefit equally 
from EBTs for PTS. 

2. KEYWORDS: 

Posttraumatic stress 
Cognitive Processing Therapy 
Prolonged Exposure Therapy 
Evidence based treatment 

3. ACCOMPLISHMENTS: 

■ What were the major goals of the project? 

The major goals of this project are to determine the effectiveness of evidence-based therapies for 
posttraumatic stress (PTS) applied naturalistically in a clinical setting; factors associated with PTS 
symptom improvement; and optimal treatment trajectories for Veterans with PTS and complex 
comorbidities. 

The project work has been divided into four main tasks: 

TASK 1. Update and Merge Existing Data and Datasets—100% completed 
TASK 2. Use NLP to Evaluate Clinical Notes—37.5% completed. 

TASK 3. Data Analysis—0% completed 

TASK 4. Finalize study requirements, prepare for future funding, and dissemination of findings—0% 
completed 
Progress on subtasks is described in detail in the following section. 


■ What was accomplished under these goals? 

To date, several milestones identified in the SOW have been achieved in support of the major goals. The 
following subtasks under Task 1 (Update and merge existing data and datasets) have been completed: 
IRB protocols were approved (subtask 1a), scientific approvals were obtained (subtask 1b), study staff 
were hired (subtask 1c), the VA OEF/OIF/OND Roster was obtained from VA Central Office (subtask 1d), 
the Roster was merged with the VA National Patient Care Database and Corporate Data Warehouse (CDW) 
(subtasks 1g, 1e), and the study sample was identified based on eligibility criteria (subtask 1f). 
Subtask 1f also included checking the quality of the record linkage, which was an important quality 
control measure. We have also obtained updated PCL measures and suicide measures (subtask 1g), and 
identified those with at least three PCL measures within 18 weeks and at least two suicide measures. 

Further, we identified all Veterans with a PCL at baseline and at least one other PCL within five years 
of seeking mental healthcare, and suicide screens at two time points also within the first five years 
(subtask 1k). Based on the sample identified in subtask 1f, subtask 1i has been modified to include all 
psychotherapy notes for all PTS visits. To account for potential bias, we need to determine the 
propensity for treatment regardless of level of utilization or documentation of mental health 
measures/symptoms. 

We received approval from ORD and NDS to access the national data sources needed for this project. 

We were then able to work with VINCI staff to create views of national data sources based on linkage to 
the Veterans listed in the OEF/OIF/OND Roster. We built a preliminary cohort of OEF/OIF/OND Veterans 
who had a diagnosis of PTS after the end of their last deployment (N=308,556; subtask 1f). This PTS+ 
cohort had a mean age of 35.9 years (SD=8.8 years), was 89.6% male, and was comprised of 57.5% 
white, 13.9% black, 11.4% Hispanic, 3.9% other, and 13.9% unknown. 

We identified all outpatient and inpatient PTS psychotherapy visits made by these Veterans in the study 
period: there were a total of 5,693,721 visits, with a median of 9 visits per patient (IQR=20). 
Ninety-five percent of these PTS psychotherapy visits linked directly to at least one clinical note (a 
corpus of 7,841,634 notes in total; subtask 1i, expanded to include the full PTS+ cohort from subtask 1f, 
and subtask 2a). The clinical notes for these visits will be analyzed using natural language processing 
(NLP) to classify each visit as evidence-based therapy (expanded to include CPT, PE, group CPT, and 
group PE), other psychotherapy (individual or group), or not psychotherapy. Building a classifier with 
good performance requires human-annotated notes (a gold standard) to train and test different machine 
learning algorithms. 

We completed a standardized annotation guide specifying each category of psychotherapy to be 
identified. From our corpus of approximately 8 million psychotherapy notes (subtask 2a) we randomly 
selected 5 documents from each of 130 VA facilities (650 documents). Using the standardized 
annotation guide, the notes were doubly annotated (task 2c) within each of two sites (SF and SLC) and 
adjudication was performed at each site (task 2d). The clinician annotation team (SF) had a high level of 
agreement (κ=0.92), and the professional annotation team (SLC) reached a fair level of agreement 
(κ=0.79). The between-site agreement was good (κ=0.84). The SLC team then reviewed all cases where 
they disagreed with the SF team's adjudicated judgments and collaborated to improve the annotation 
guide for use in subsequent rounds. Consistent with our pilot data, our first round of annotation 
showed that half of the documents were irrelevant (not psychotherapy) and that some of the key 
evidence-based therapy classes contained an insufficient number of documents for algorithm creation. 
Given this finding, we have taken advantage of improved methods of performing large-scale NLP 
developed at the SLC site. 
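
The agreement figures in this section can be illustrated with a short calculation. The sketch below assumes the reported statistic is Cohen's kappa for two annotators; the report does not state which kappa variant was used. The function name and the example labels are hypothetical and are not study data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same documents."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of documents given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels for ten notes (illustration only).
annotator_1 = ["EBT", "EBT", "other", "none", "none",
               "none", "other", "EBT", "none", "none"]
annotator_2 = ["EBT", "other", "other", "none", "none",
               "none", "other", "EBT", "none", "EBT"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters here because roughly half of the first-round documents fell into a single class.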

To enrich our corpus, we created an NLP system to filter out the "Not Psychotherapy" notes for ongoing 
rounds of annotation. Using a training set of notes from the first round of annotation that all four 
reviewers agreed were "Not Psychotherapy," we used a bag-of-words and support vector machine (SVM) 
approach to generate a classifier for irrelevant notes that has a classification accuracy 
((TP+TN)/Total) of 88%. This has allowed us to remove irrelevant documents and generate an enriched 
pool of approximately 4 million psychotherapy notes, which will greatly increase the efficiency of our 
annotation process. We will no longer be using ARC (task 2b), and have instead switched to a 
semi-supervised platform that has improved machine learning performance and fewer systematic delays, 
as recommended by Drs. DuVall and Patterson at the SLC site. This platform will also vastly improve the 
speed of document classification once the NLP system is finalized. 
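
As a minimal sketch of two pieces of this approach, the code below shows a bag-of-words representation of a note and the accuracy formula (TP+TN)/Total. It is not the project's classifier: SVM training is omitted, and the tokenization rule, the example note text, and the confusion-matrix counts are all assumptions made for illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case a note and count its word tokens (word order is discarded)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def accuracy(tp, tn, fp, fn):
    """Classification accuracy: (TP + TN) / Total."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Hypothetical note text, not taken from the study corpus.
features = bag_of_words("Session 3 of CPT. Reviewed CPT worksheet.")

# Hypothetical counts chosen so that accuracy equals 88%.
acc = accuracy(tp=45, tn=43, fp=7, fn=5)
print(features["cpt"], acc)
```

In a full pipeline the word counts would be the input features to the SVM, and accuracy would be computed on notes held out from training.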

Also during this period, we identified 30,884 (10%) Veterans in our PTS cohort with at least one instance 
of three (or more) PCLs within an 18-week period (subtask 1h). We compared these Veterans to those 
members of the PTS cohort without that level of data on demographic and military characteristics (date 
of last deployment, age, gender, race, rank, and number of deployments) and found no substantial 
differences. We aligned the PCL data with the patients' billed outpatient psychotherapy visits to 
estimate the number of psychotherapy visits with corresponding (within one week) PCL data 
(subtasks 1j, 1k). The subset of 30,529 PTS patients who had both psychotherapy sessions and PCL data 
had a total of 1.51 million visits, with a median of 26 (IQR=12-53) psychotherapy visits per patient; 
64,789 (4.3%) of these visits had a proximal PCL. We identified 101,761 members of the PTS cohort (33%) 
with two or more suicide screenings during the post-deployment period (subtask 1h). We 
compared these Veterans to those with only one or no suicide screens on demographic and military 
characteristics (date of last deployment, age, gender, race, rank, and number of deployments) and 
found no substantial differences. There were 11,799 Veterans with a PCL at baseline and at least one 
other PCL within five years of seeking mental healthcare, and suicide screens at two time points also 
within the first five years (subtask 1k). 
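
The two date-based selection rules in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual code: the function names are hypothetical, the 18-week window is treated as inclusive, and "within one week" is read as seven days on either side of a visit. The dates are invented.

```python
from datetime import date, timedelta

def has_dense_pcl_window(pcl_dates, min_count=3, window_weeks=18):
    """True if at least `min_count` PCLs fall within any `window_weeks` span."""
    dates = sorted(pcl_dates)
    window = timedelta(weeks=window_weeks)
    # On sorted dates, compare each PCL with the one (min_count - 1) later.
    for i in range(len(dates) - min_count + 1):
        if dates[i + min_count - 1] - dates[i] <= window:
            return True
    return False

def visits_with_proximal_pcl(visit_dates, pcl_dates, max_gap_days=7):
    """Return the visits that have a PCL within `max_gap_days` on either side."""
    gap = timedelta(days=max_gap_days)
    return [v for v in visit_dates
            if any(abs(v - p) <= gap for p in pcl_dates)]

# Hypothetical dates for one patient (illustration only).
pcls = [date(2012, 1, 5), date(2012, 3, 1), date(2012, 5, 1), date(2013, 2, 1)]
visits = [date(2012, 1, 10), date(2012, 2, 1), date(2012, 5, 7), date(2012, 8, 1)]

eligible = has_dense_pcl_window(pcls)
proximal = visits_with_proximal_pcl(visits, pcls)
print(eligible, proximal)
```

Sorting the PCL dates first means only one pass is needed to test every candidate window, which matters when the rule is applied across a cohort of several hundred thousand patients.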

During this period it became clear that ARC was not the tool best suited to meet our research goals 
(Task 2b: "Use the Automated Retrieval Console (ARC) algorithm designed to identify CPT and PE 
notes"). After consultation with the team, we have moved to an updated NLP platform that will 
improve performance and efficiency. Our team's deep expertise in informatics has allowed us to 
incorporate better and faster technologies into our work as we encounter obstacles. This is a strength 
of our multi-disciplinary research approach. 





■ What opportunities for training and professional development has the project 
provided? 

Nothing to report (not a goal of this study). 

■ How were the results disseminated to communities of interest? 

Nothing to report (not yet at dissemination phase). 

■ What do you plan to do during the next reporting period to accomplish the 
goals? 

Currently, the SF and SLC annotation teams are working through another 650 documents from the 
enriched corpus using the improved annotation guide. If agreement within and between sites is high 
and sufficient numbers of each psychotherapy class are identified, subsequent rounds of annotation will 
be completed by the SLC annotation team only, with consultation from the clinical experts on the SF 
annotation team as needed. We anticipate that this second round of annotation will be completed in 
the next two weeks. As rounds of annotation are completed, Drs. DuVall and Patterson will create 
evidence-based psychotherapy classifiers, asking for additional rounds of annotation when performance 
is limited by insufficient examples. We anticipate finalization of the evidence-based psychotherapy 
classifier and application of the classifier to the corpus by the end of next quarter. 


4. IMPACT: 

■ What was the impact on the development of the principal discipline(s) of the 
project? 

Nothing to report. 

■ What was the impact on other disciplines? 

Nothing to report. 

■ What was the impact on technology transfer? 

Nothing to report. Once the project is complete we will be able to share our algorithms 
to further research in this area. 
■ What was the impact on society beyond science and technology? 

Nothing to report. 

5. CHANGES/PROBLEMS: 

There were some delays in acquiring the OEF/OIF/OND Roster, but this issue has been resolved and we 
are still on track, given that we left extra time for this task. 

Our first round of annotation showed that the majority of documents were irrelevant (not 
psychotherapy) and that some of the key evidence-based therapy classes had too few documents to 
learn from. Therefore, we developed an NLP classifier to remove irrelevant documents in order to enrich 
our text corpus. It became clear that ARC was not the tool best suited to our research task. Therefore, we 
have moved to an updated NLP platform that will improve performance and efficiency. 

Poor computing infrastructure in VINCI caused delays in creating the database of psychotherapy visits 
and corresponding clinical notes, as well as delays in extracting and cleaning mental health measures 
data (PCL and suicide screens). Given significant delays on the VINCI Windows desktop, we transferred 
the bulk of analytical work to the VINCI (Linux) SAS/GRID. In addition, we received permission to access 
our VINCI extracts from a secure local server at the SFVA, which provides an alternate system for data 
processing and analysis in the event of future problems on the VINCI desktop. 

6. PRODUCTS: 

In this period, we created a database of PTS+ OEF/OIF/OND Veterans and their demographic 
characteristics and medical record data, their PTS psychotherapy visits and corresponding clinical notes, 
as well as their PCL and suicide measures. We completed, tested, and improved a standardized 
annotation guide specifying each category of treatment to be annotated. We created a classifier to 
separate irrelevant documents from psychotherapy notes, allowing us to distill our text corpus to the 
most important documents. 

7. PARTICIPANTS & OTHER COLLABORATING ORGANIZATIONS 

■ What individuals have worked on the project? 


Name: 

Shira Maguen 

Project Role: 

Principal Investigator, San Francisco VAMC 

Researcher Identifier (e.g. orcid id): 

N/A 

Nearest person month worked: 

3.5 

Contribution to Project: 

Dr. Maguen has provided coordination, oversight, and 
management of all tasks outlined in the research plan, 
working closely with her co-investigators. 
Name: 

Brian Shiner 

Project Role: 

Co-investigator, White River Junction VAMC 

Researcher Identifier (e.g. orcid id): 

N/A 

Nearest person month worked: 

4 

Contribution to Project: 

Dr. Shiner has helped the team use his natural language 
processing algorithms to identify the use of evidence-based 
psychotherapy for PTS. He has also assisted with 
methods related to this project, given his prior experience 
with NLP. 

Name: 

Erin Madden 

Project Role: 

Statistician 

Researcher Identifier (e.g. orcid id): 

N/A 

Nearest person month worked: 

5 

Contribution to Project: 

Ms. Madden has worked to acquire the data sources used 
for this project, built the initial cohort and derived 
administrative-based datasets for the cohort. 

Name: 

Scott DuVall/Olga Patterson/Corinne Halls 

Project Role: 

NLP Expert, Salt Lake City VAMC 

Researcher Identifier (e.g. orcid id): 

N/A 

Nearest person month worked: 

4 

Contribution to Project: 

Drs. DuVall and Patterson advised the study team on NLP 
methods and are starting to test and modify the 
algorithm. Ms. Halls is overseeing annotation and coding 
for the NLP team. 

Name: 

Kristine Burkman 

Project Role: 

Clinical Psychologist Coder 

Researcher Identifier (e.g. orcid id): 

N/A 

Nearest person month worked: 

2 

Contribution to Project: 

Dr. Burkman is a clinical psychologist who has coded 
clinical notes during the performance evaluation phase of 
the study. 

Name: 

Lisabeth Goldstein 

Project Role: 

Clinical Therapist Coder 

Researcher Identifier (e.g. orcid id): 

N/A 

Nearest person month worked: 

2 

Contribution to Project: 

Dr. Goldstein is a postdoctoral fellow who has coded 
clinical notes during the performance evaluation phase of 
the study. 



■ Has there been a change in the active other support of the PD/PI(s) or 
senior/key personnel since the last reporting period? 

Nothing to report. 

■ What other organizations were involved as partners? 

Nothing to report. 
Study/Product Aim(s) 

•Aim 1: Determine whether Iraq and Afghanistan Veterans that receive 
EBT for PTS across the entire Veterans Administration (VA) 
demonstrate improvement in PTS and suicide symptoms. 

•Aim 2: Determine what percentage of Veterans with PTS complete a 
minimally-adequate dose of EBT for PTS, as well as factors associated 
with treatment completion. 

•Aim 3: Determine the association between treatment profiles (early, 
delayed, and no EBT) and symptom improvement in Veterans with 
PTS, including those with complex comorbidities (depression, TBI, 
substance use disorders, and/or pain disorders). 

Approach 

Retrospective cohort study using multiple sources of VA data from 2007 
to 2014. Sample will include Iraq and Afghanistan Veterans with PTS 
who are new users enrolled in the VA health care system. 

Natural Language Processing (NLP) will be used to determine receipt 
of evidence-based psychotherapy for PTS. 


Natural Language Processing Methods 

• Population Identification: Use VA databases to identify specific patients of interest 
• Retrieve Note Text: Pull from CDW 
• Validate Existing Algorithms on Study Note Set: Load into semi-supervised platform; automated coding 
of a random sample of the note set; clinician team annotates the same notes; calculate algorithm 
performance 
• Standards (decision point; YES branch) 
• Annotation: Chart review by clinician team; random sample of note pool 
• Model Creation/Modification: Machine learning; 10-fold cross validation 
• Performance Evaluation: Recall, performance, f-measures as good as kappa; review by clinician team 
• Automated Coding: Code remaining notes; resolve dual codes 
• Data Analysis: Notes describing EBT use; judgment becomes numeric variable 
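
The model evaluation steps named in this workflow (10-fold cross validation, recall, f-measures) can be sketched as below. This is a hedged illustration, not the project's pipeline: the function names are hypothetical, precision is included on the assumption that the f-measure is the usual harmonic mean of precision and recall, folds are assigned without shuffling, and the labels are invented.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]  # every k-th item, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 for one class against gold-standard labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 650 matches the size of one annotation round; the labels are hypothetical.
folds = list(k_fold_indices(650, k=10))
gold = ["EBT", "EBT", "EBT", "other", "other"]
pred = ["EBT", "EBT", "other", "EBT", "other"]
p, r, f1 = precision_recall_f1(gold, pred, positive="EBT")
print(len(folds), p, r, f1)
```

Each annotated note appears in exactly one test fold, so every note contributes once to the held-out performance estimate.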


Timeline and Cost 


Goals/Milestones 


Activities (timeline by calendar year: CY15, CY16) 


Update and Merge Existing Data and 
Datasets 

Use NLP to Evaluate Clinical Notes 
Data Analysis 

Finalize Study Requirements, Prepare for 
Future Funding, and Disseminate Findings 



Estimated Budget ($780,491 directs) 


Updated: April 2016 (SFVAMC - San Francisco, CA) 


CY15 Goals - Update and Merge Existing Data and Datasets 
☑ Update and merge multiple VA datasets 
☑ Population identification 
☑ Retrieve note text 
☑ Begin developing standardized annotation guide 

CY16 Goals - Use NLP to Evaluate Clinical Notes 

☑ Complete creation of standardized annotation guide 
☑ Quadruple annotation of 650 psychotherapy notes 
☑ Build classifier to remove irrelevant notes 

□ Annotation of enriched set of 650 notes 

□ Build NLP model for types of EBT 

□ Automated coding 

CY17 Goals - Complete Data Analysis and Disseminate Findings 

□ Complete data analysis 

□ Finalize study requirements 

□ Prepare for future funding 

□ Disseminate findings 




























